RDD Examples

Find the sum of List Value
val rdd  = sc.parallelize(List("animal", "human", "bird", "rat"))
val rdd1=rdd.map(x =>(x.length))
rdd1.reduce(_+_)

 
Word count program
object WordCount {
       def main(args: Array[String]) {

          println("In main : " + args(0) + "," + args(1))

          //Create Spark Session
          val spark = sparkSession.builder().setAppName("WordCountApp")
                            .config("spark.some.config.option", "some-value")
                            .config("spark.sql.warehouse.dir", warehouseLocation)
                            .enableHiveSupport()
                            .getOrCreate()

          //Load Data from File
          val input = sc.textFile(args(0))

          //Split into words
          val words = input.flatMap(line => line.split(" "))

          //Assign unit to each word
          val units = words.map ( word => (word, 1) )

          //Reduce each key
          val counts = units.reduceByKey ( (x, y) => x + y )

          //Write output to Disk
          counts.saveAsTextFile(args(1))

          //Shutdown spark. System.exit(0) or sys.exit() can also be used
          sc.stop();
       }
    }

Join two files read with textFile()

Read textFile 1
  A B
  1 2
  3 4
val txt1=sc.textFile(path="/path/to/input/file1")

Read textFile 2
  A C
  1 5
  3 6

val txt2=sc.textFile(path="/path/to/input/file2")

Join and print the result.

txt1.join(txt2).foreach(println)

  A B C
  1 2 5
  3 4 6
The join above is based on the first column.

No comments:

Post a Comment